When will Feature Feedback help? Quantifying the Complexity of Classification Problems
Abstract
Supervised learning typically requires human effort to label a large number of training instances. Active learning strives to decrease the number of labeled training examples needed by engaging the learner and the human in an interactive process, and it has proven effective in many domains. Past work has found that, when few training examples are available, user prior knowledge about the importance of features (interactive feature feedback) can guide the learner to converge faster, that is, with lower labeling costs. In this paper we aim to understand the kinds of problems for which such extra feedback is significantly beneficial. In other words, we ask which problems can significantly benefit from interactive learning, and whether for some problems the user has no choice but to engage in the tedious process of labeling many examples. Towards this goal, we define a set of four difficulty measures, two each of instance complexity and feature complexity, for linear classification problems. These measures can be computed efficiently for real-world problems for which linear classifiers are effective, such as text classification. We quantify the difficulty of 358 text classification problems and 9 corpora using our measures, illustrating the spectrum of problems that exist in text classification and quantifying results that have only been discussed qualitatively in the text classification literature. Using our measures, we verify the close relationship (a high positive correlation) between feature complexity and instance complexity. We then use these measures to understand when feature feedback is likely to be very useful. We observe that many problems in the commonly used data sets are of low to medium complexity, that is, only tens of well-selected features are required to gain most of the maximum attained performance on such concepts. We find that learning these kinds of problems especially stands to benefit from feature feedback.
We note that our empirical difficulty measures and the rankings of problems and domains are of independent interest, beyond the active learning setting.
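The abstract does not define the four measures, so the following is only an illustrative sketch of the general idea behind a feature-complexity measure: given a classifier's performance when restricted to its top-k ranked features, find the smallest k that reaches a chosen fraction of the best observed performance. The function name `features_needed`, the `fraction` parameter, and the numbers in `curve` are assumptions made for illustration and are not taken from the paper.

```python
def features_needed(scores_by_k, fraction=0.9):
    """Return the smallest feature count k whose score reaches
    `fraction` of the best score in `scores_by_k`.

    scores_by_k: dict mapping a feature count k to a non-negative
    performance score (e.g. F1) obtained with the top-k features.
    This is a hypothetical proxy, not the paper's definition.
    """
    if not scores_by_k:
        raise ValueError("scores_by_k must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    best = max(scores_by_k.values())
    target = fraction * best

    # Scan feature counts in increasing order; the k that attains
    # `best` always meets the target, so the loop always returns.
    for k in sorted(scores_by_k):
        if scores_by_k[k] >= target:
            return k


# Made-up performance curve for a single hypothetical problem.
curve = {1: 0.40, 2: 0.55, 5: 0.70, 10: 0.82, 20: 0.88, 50: 0.90, 100: 0.91}
print(features_needed(curve, fraction=0.9))
```

Under this proxy, a problem whose curve saturates after a handful of features would count as low complexity, which matches the kind of problem the abstract says stands to benefit most from feature feedback.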
Publication date: 2007